Degraded document image enhancement
Abstract
Poor-quality documents arise in many settings, including historical document collections, legal archives, security investigations, and documents found in clandestine locations. Such documents are often scanned for automated analysis, further processing, and archiving. Because of their condition, the resulting degraded document images are often hard to read, have low contrast, and are corrupted by various artifacts. We describe a novel approach for enhancing such documents, based on probabilistic models, that increases their contrast and thus their readability under various degradations. If desired, the enhancement produced by the proposed approach can be viewed under different viewing conditions. The proposed approach was evaluated qualitatively and compared to standard enhancement techniques on a subset of historical documents obtained from the Yad Vashem Holocaust museum. In addition, quantitative performance was evaluated on synthetically generated data corrupted under various degradation models. Preliminary results demonstrate the effectiveness of the proposed approach.
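The quantitative evaluation described in the abstract (synthetic data, a degradation model, comparison against a standard enhancement technique) can be illustrated with a minimal sketch. This is not the probabilistic method proposed in the paper, whose details the abstract does not give. The degradation model (contrast compression plus additive Gaussian noise), the baseline (percentile contrast stretching), the PSNR metric, and all function names and parameter values below are assumptions chosen for illustration only.

```python
import numpy as np


def make_synthetic_page(height=64, width=64):
    """Hypothetical clean 'document': white background with dark strokes."""
    page = np.full((height, width), 255.0)
    page[8::16, 8:-8] = 0.0  # thin dark lines standing in for text
    return page


def degrade(clean, contrast=0.3, offset=100.0, noise_sigma=5.0, seed=0):
    """Assumed degradation model: compress contrast, then add Gaussian noise."""
    rng = np.random.default_rng(seed)
    noisy = contrast * clean + offset + rng.normal(0.0, noise_sigma, clean.shape)
    return np.clip(noisy, 0.0, 255.0)


def stretch_contrast(image, low_pct=1.0, high_pct=99.0):
    """Baseline enhancement: percentile-based linear contrast stretching."""
    low, high = np.percentile(image, [low_pct, high_pct])
    if high <= low:  # flat image, nothing to stretch
        return image.copy()
    return np.clip((image - low) / (high - low) * 255.0, 0.0, 255.0)


def psnr(reference, test):
    """Peak signal-to-noise ratio in dB for images in the range [0, 255]."""
    mse = np.mean((reference - test) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(255.0 ** 2 / mse))


clean = make_synthetic_page()
degraded = degrade(clean)
enhanced = stretch_contrast(degraded)

print("PSNR degraded:", psnr(clean, degraded))
print("PSNR enhanced:", psnr(clean, enhanced))
```

Because the clean image is known for synthetic data, any enhancement method can be scored the same way by substituting it for `stretch_contrast`.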
Similar resources
An Improved Contrast Image Based Document Image Binarization Technique for Degraded Document Images
Document image binarization converts a gray-scale document image into a binary document image. It is usually performed in the pre-processing stage of document image analysis and aims to segment the foreground text from the document background. Segmenting the foreground text from the document background is a difficult task in the case of degraded document images. In this paper we propose a sim...
Ancient Document Images Enhancement Using Phase Based Binarization
In this paper, we present a phase-based binarization model for degraded document images, as well as a post-processing method that can improve any binarization method and a ground truth generation tool. Many binarization techniques have been implemented in the literature for different types of binarization problems. It includes an adaptive image contrast based document image binarization technique t...
Robust binarization of degraded document images using heuristics
Historically significant documents are often discovered with defects that make them difficult to read and analyze. This fact is particularly troublesome if the defects prevent software from performing an automated analysis. Image enhancement methods are used to remove or minimize document defects, improve software performance, and generally make images more legible. We describe an automated, im...
Optimizing OCR accuracy for bi-tonal, noisy scans of degraded Arabic documents
Acquiring foreign language from degraded hardcopy documents is of interest to military and border control applications. Bi-tonal image scans are desirable because file size is small. However, the nature of hardcopy degradations and the scanner or image enhancement software capabilities used directly affect the quality of the captured image and the extent of language acquisition. We applied a co...
Extraction of Original Text Document from a Set of Degraded Text Documents from the Same Source
Information extraction is the task of extracting structured data from a degraded document. It includes the extraction of data such as text, images, or graphics from sources such as images, video, or documents. Text detection and extraction from degraded documents finds application in a wide range of studies. In this paper, an Optical Character Recognition less (OCR-less) method of obtaining an origi...